首页> 外文OA文献 >The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space
【2h】

The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space

机译:Wavelet Trie:维护一个索引的字符串序列   压缩空间

摘要

An indexed sequence of strings is a data structure for storing a stringsequence that supports random access, searching, range counting and analyticsoperations, both for exact matches and prefix search. String sequences lie atthe core of column-oriented databases, log processing, and other storage andquery tasks. In these applications each string can appear several times and theorder of the strings in the sequence is relevant. The prefix structure of thestrings is relevant as well: common prefixes are sought in strings to extractinteresting features from the sequence. Moreover, space-efficiency is highlydesirable as it translates directly into higher performance, since more datacan fit in fast memory. We introduce and study the problem of compressed indexed sequence of strings,representing indexed sequences of strings in nearly-optimal compressed space,both in the static and dynamic settings, while preserving provably goodperformance for the supported operations. We present a new data structure for this problem, the Wavelet Trie, whichcombines the classical Patricia Trie with the Wavelet Tree, a succinct datastructure for storing a compressed sequence. The resulting Wavelet Triesmoothly adapts to a sequence of strings that changes over time. It improves onthe state-of-the-art compressed data structures by supporting a dynamicalphabet (i.e. the set of distinct strings) and prefix queries, both crucialrequirements in the aforementioned applications, and on traditional indexes byreducing space occupancy to close to the entropy of the sequence.
机译:索引的字符串序列是一种数据结构,用于存储支持完全匹配和前缀搜索的字符串序列,该序列支持随机访问,搜索,范围计数和分析操作。字符串序列是面向列的数据库,日志处理以及其他存储和查询任务的核心。在这些应用程序中,每个字符串可以出现几次,并且字符串在序列中的顺序是相关的。字符串的前缀结构也很重要:在字符串中寻找通用前缀以从序列中提取有趣的特征。此外,空间效率非常理想,因为它可以直接转化为更高的性能,因为更多的数据可以容纳在快速存储器中。我们介绍并研究了压缩的索引字符串序列的问题,它表示在静态和动态设置下几乎最佳的压缩空间中的字符串的索引序列,同时为支持的操作保留了可证明的良好性能。我们提出了针对此问题的新数据结构,即Wavelet Trie,它将经典的Patricia Trie与Wavelet Tree相结合,Wavelet Tree是用于存储压缩序列的简洁数据结构。产生的Wavelet Triesmoothly适应了随时间变化的字符串序列。通过支持动态字母(即一组不同的字符串)和前缀查询(这是上述应用程序中的关键要求)和传统索引(通过减少空间占用以使其接近熵),它改进了最新的压缩数据结构。序列。

著录项

  • 作者单位
  • 年度 2012
  • 总页数
  • 原文格式 PDF
  • 正文语种 {"code":"en","name":"English","id":9}
  • 中图分类

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号